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Abstract 

This paper presents a backfitting-type method for estimating and forecasting a periodically correlated par- 
tially linear model with exogeneous variables and heteroskedastic input noise. A rate of convergence of the 
estimator is given. The results are valid even if the period is unknown. 



Resume 

On utilise une procedure iterative de type backfitting pour estimer les parametres d'une classe de mod- 
ules partiellement lineaires periodiquement correles presentes pour modeliser revolution de la consommation 
d'electricite. On obtient une vitesse de convergence des estimateurs et un intervalle de prevision de consom- 
mation. 



keywords: a-mixing, additive models, backfitting, electricity consumption, forecasting interval, semipara- 
metric regression smoothing. 



1 Introduction 

In this paper, we focus on partially linear models of the type 



p q 

X n = ^ a j X n-j + X] b i( e n-j) + a ( e n^ e n -q') s n- (1-1) 
j=l 3=0 

The parameters p > 1 and q > are supposed known while the coefficients aj as well as the functions bj and a 
are unknown. The sequence (e„) is an unobserved system noise. The aim is to predict X n+ h, for some h > 1, 
from {{X n , e n ), (X n _i, e„^i), . . .), the observed set of past values available at date n. 



During the last 20 years, partially linear autoregressive models such as (1.1) have gained attention, as being 
a good compromise between linear models and purely non parametric ones. Such models, proposed in [S] to 
represent the relationship between weather and electricity consumption are now widely used in the literature. 



See for example [15] where a chapter is devoted to models including (1.1 1. The functions bj are expanded on 



a suitable basis and the first coefficients of this expansion, together with the a^'s, are estimated via a L.M.S 
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method. See also [T5J . With the same type of partially linear models [SJ [TO] use wavelets in the estimation 
scheme. In [JJ, the b/s are treated as nuisance parameters. Let us also mention [ITJ [T2 H2] , devoted to models 
including purely autoregressive ones, where some past values operate in a linear form and the others in a 
functional one. These authors use an orthogonal series method, and propose a data based criterion to determine 
the truncation parameters. See also chapter 8 in [S], where models like 

X n +i = f{X n ) + aX n -i + <r(X n )e n . 

include linear and non linear autoregressive summands together with some volatility. The functional parts are 
estimated via local linear estimators and gaussian limits for the renormalized errors are obtained. 



Model (1.1 ) presents several advantages. Firstly, the additive form reduces the so-called curse of dimension- 
ality. Secondly, linear autoregression is preserved when expressing the future values (A"n+ft.)fc=i,... from the past 
ones (X n -h,e n -h)h=o,...i which makes it easier, and in some sense coherent, forecast at lags greater than 1. 
Lastly, model ( |1.1[ ) is specially well adapted to the situation where the output X n is electricity consumption at 
date n and the input e„ the temperature at the same date, since it is well-known that the effect of temperature 
on electricity sales is highly non-linear at extreme temperatures, while linearity of the autoregression seems to 
be a reasonable assumption. Notice that, in practical situations, the temperature at date n is either measured or 
forecasted by Meteo- France. In both cases, the value of the exogeneous variable e„ is known. Accurate electrical 
load forecasting is essential for power utilities. Electricite de France (EDF) performs a climatic correction. The 
influence of temperature on electric demand is widely reported. Other extra so-called exogenous variables are 
included in the short-term models. They may be random variables like wind speed, or deterministic ones like 
"position-within-the year of the date" which is a year-periodic variable. With those variables, for horizons up to 
3 days by 1/2 hourly steps, the forecasts are very efficient when based on nested models studied during many 
years. 

For simplicity and convenience, we only consider in this paper the situation q = q' = 0, leading to the model 

X n = axX n _ x . . . + a p X n _ p + b{e n ) + a{e n )e n , n e Z. (1.2) 

The algorithm presented below can easily be adapted to the general case q, q' > 0, and the results of theorem [2] 
still hold with a loss of speed if b or a have non-additive forms. 

1.1 Elements of discussion 
1.1.1 Backfitting 

Backfitting methods, first proposed by [3], are usually recommended for additive models which involve several 
explanatory variables, each having an unknown functional form. The method is well described in [E1[T7]. See 
also pj HU 122] where the estimation algorithms use local polynomial regression and [2D] based on projections 
on polynomial spaces. The performances of backfitting procedures when autoregression is involved are less well 
understood. In |28j . for the non linear stationary autoregressive model with exogeneous variables 

X n = a(X n -t) + b(e n ) + e n , 

the algorithm works in two steps: the first step builds a preliminary estimator of a et b by piecewise constant 
functions. Then, from the obtained pseudo remainders, the second step builds kernel estimators of the same 
functions. The author obtains the limit law for the estimation error. 



For the model ( 1.1 1, if the period T is known, a simple estimation scheme would consist of splitting the data 
in T subsamples, each of them being a trajectory of a stationary process. Then the parameter 9 = t {a\, . . . , a p ) 
and the function 6() could be estimated separately, the first one at the usual parametric rate, and the second 
one at the slower usual functional rate (see [HI 023 f° r remarks on this question). The choice of a backfitting 



scheme for estimating (1.1) presents the advantage of allowing the period of the input sequence (e n ) to remain 



unknown. As it will be proved below, the price to pay for this is a slower rate in the estimation of 9. Note that 
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simulation studies seem to indicate that the iterative method presented below still works even when the period 
shows slight variations. Within the backfitting iterations, a kernel based statistics estimates the functional part 
of the model. Other methods could have been used here (local estimators, splines, wavelets for instances). 
Usually tuning the bandwidth, through a cross-validation process, enhances the estimators' quality. We haven't 
studied that point for two reasons: no theoretical results are available and we wished to study the bare quality 
of the basic estimators. The underlying questions are postponed to a future paper. 

1.1.2 Parameters p and q 

It could be interesting to estimate the orders p and q of the autoregression and regression parts. In a first 
approach, we suppose that these parameters are known. In fact, in the particular situation of forcasting 
electricity consumption, these parameters have been widely studied and are supposed to be known. The order 
p is large, but the characteristic polynomial has only few non-zero coefficients, so that the coefficients dj are to 
be estimated under constraints. Our convergence results can easily be extended to that sort of situation. 

1.1.3 Comments on the results 

Sections [i] and [4] hereafter mainly consists of asymptotic results. These results are formulated as 9 n — 9 = 0(u n ) 
for a sequence u n going to zero. Several complements are missing: 

• Is the rate u n exact? 

• If that is the case, how do we get an idea of the constant in 0{u n )7 

• What about the limit distribution of the re-normalized error? 

The almost sure (a.s) convergence is the only type of studied convergence. No central limit theorem is included. 
The usual developments of expectation and variance, even when they are included, are not brought forward. 
The a.s. convergence is all that is needed to compute the forecast interval as far as the asymptotic interval is 
concerned. The innovation distribution quantiles are all that we need. For the last question, a complementary 
study is in progress, in order to obtain gaussian limits as it is the case in this kind of studies (see for example 
[5]). The section here devoted to simulations attempts to answer the first questions. See for example, Figure [5] 
and the comments in section T5. 21 



2 Estimation of the parametric and non parametric components 

The aim is to estimate the functions b(.) and tr(.) and the vector parameter 

6 = t (a 1 , . . .,a p ). 

Denoting 

the model can be written 

X n = t cj> n e + b{e n ) + a{e n )e n . (2.3) 
We choose a kernel K, and a smoothing parameter h n . 

Having chosen initialised estimation of 9 and a stopping rule, the iterative method consists of estimating 9 
(resp. b) by using an estimation of the residual calculated from the previous estimation of b (resp. 9). 

• Initialisation. Fix the first value 9^ 
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Step 1. Estimate the function b by a kernel estimator based on the partial residuals 



= P + 
where 

K n (e) := K 



e 



• Step 2. Update the estimation of 8 by a least mean squares estimator based on the new partial residuals 

n 

0^ = Argmin, £ (X t &8 - b^^ jf 



n-1 



with 



l=p+l 



£„ = ^ (2.4) 

i= P +i 

Finally, the transition from step k — 1 to step fc can be expressed as 

6 « ^ = — U-i „ ; , (2-5) 

n 

6^ = S^ 1 MXi-b^id)). (2.6) 

Chosing a stopping time A: for the iterations, the variance c 2 (e) is then estimated by a kernel method 
using the partial residuals based on the estimates and b^ -1 ^ 



EKh (*J "* ) - ^ _1) (e;)) 2 K n (e - e,) 



J2i=p+i K n (e-e/) 

As in the case of linear regression, estimating and 6 does not need any estimation of a, implying that er„.fc is 
obtained at the end of the iterative scheme. See [5] for remarks on this so-called oracle effect. 

3 Main results 
3.1 Hypotheses 

We adopt the following basic hypotheses (H). 

• Hi: Periodicity. The exogeneous sequence (e n ) is the sum of a periodic deterministic sequence (s n ) and 
a bounded zero-mean strong white noise 

e„ = s n + rj n \fn (3.8) 
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• T-L2'- Whiteness of the system noise. (e„) is a bounded i.i.d sequence of zero-mean variables, and Var(e„) = 
1. 

• W3: Stability. The autoregressive dynamic is stable. In other words, the polynomial 

A(z) = 2?- ^"• - / ' j 

does not vanish on the domain \z\ > 1. 

• Hi: Independence of the inputs. The two sequences and (r) n ) are independent. 

• "Hs: On the distributions of input sequences. The distributions of £\ and 771 both have a density. The 
density / of 771 is continuous and non-vanishing on the support [— m v , m^] of 771. The density g of E\ is C\ 
and never vanishes on the support [— m ei m e ] of £\. 

• Hq: On the functions . Let £ = L)J =1 [sj — m n , Sj +m v ] denote the union of the T compact supports of the 
variables ej. 



1. The function b is 7-Holderian on £ , for some < 7 < 1, which means that 

Hex) - b(e 2 )\ 



sup 

i,e 2 e£ |ei-e 2 | 7 



< 00 



2. The variance <J 2 (e) of the input noise is 7 1 -Holderian on £, for some < 71 < 1, and 

inf cr(e) > 0. 

• T-ls- On the kernel . The kernel K is lipschitzian, and satisfies 

K(u)du = 1 



(3.9) 



(3.10) 



Keeping in mind the example of electricity consumption, hypothesis "Hi allows some periodicity in the 
random structure of the input sequence (e„). Boundedness of the noises (hypotheses T-L\ and H2) is 
assumed only to have shorter proofs. Without this boundedness, the uniform speed in Theorem [2] only 
holds on compact sets. Hypothese assures that the denominators of the kernel-type estimators of b 
and a are not asymptotically vanishing. The Holder exponents 7 and 71 in hypothesis He, govern the 
convergence rate of the estimation scheme. 



In what follows, we work with the periodically correlated solution of (2.3) defined in section 6.1 



3.2 Existence of 0% 



(fe) 



The lemma below establishes that, almost surely, the matrix £„ = Xw=p+i 4 >t i ( i ) i appearing in (2.4) and used in 
estimating the parameter 6 is invertible at least for large enough n. 



Lemma 1. Under the hypotheses Hi 234, the matrix E„ being defined in (2.4-), as n — > 00, 

(i) 

T-1 , T-1 



S 



;=o 1=0 



M «V (0 + r 
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whe 



and where fi^ = E(0g ) ) and is the covariance matrix of <f>Q ^ . 
(ii) The limit matrix M is regular. 

The proof is in the Appendix. 



(3.11) 



3.3 Analysis of estimation errors 

We first focus on the estimation errors 

§W = 9 - §W and U n k -V ( e ) = 6(e) - (e). 



From (2.3), 



= -S- 1 £ i (6l fc - 1 )(e i )+a(e i ) £i 
/=p+i 

EL P +i (- t 0/^~ 1) + 6(e) - 6(ei) - <j( ei ) £l ) K n (e - e,) 



Thanks to the linearity of 6« (e) with respect to l; , this leads to the linear recusrive equation 



(k-i) 



where 



5(*0 - a 9 {k ~^ + /?W 4- /?( 2 ) 



, v _i | , ^j^n (e; - e,-) 



E"=p+i ( & ( e i) - &(ej) + o-(ej)ej) # n (ej - e.,) 



EL P+ i (ej - ej) 



3.4 Convergence results 



(3.12) 



(3.13) 



(3.14) 
(3.15) 

(3.16) 
(3.17) 
(3.18) 



Considering (13. 151) , we are going to prove that, as n — > oo, the matrix operator A n converges to a strictly 

. . n-n . . • ~(k) 

shrinking one. As emphazised in [3] this is the key result implying that 9 n stabilizes as k increases. Then we 
prove that the remainder term ii„ + R^f 1 tends to zero, which implies that the stabilizing value 9^n°^ in turn 
vanishes when n — > oo. This leads to the main result, whose detailed proof is in the Appendix. 



Theorem 2. With the assumptions of section \3. 1\ if the smoothing parameter is such that, as n — > oo h n 
vP 1 (Inn)'' 2 , there exists f3 €]0, 1[ such that 



5(fc) 



2 

m 



sup ee£ \b { n >{e) - 6(e) | 



= O a . s . 



lnr? 

nhr. 



+ o a . s .(hZ) + o a . s .{p k ) 



(3.19) 



G 



sup \al k (e) - a 2 (e)\ = O a , s . ( \f^f] + O.,!^ 7 ' 7 ' 1 ) + O a . s .(P k ) 
e <££ \ V nh n \ 



(3.20) 



where the 0(.) 's are uniform with respect to k and n. 



We see that the convergence rate of er 2 k (e) cannot exceed that of the other parameters and can even be 
slower when b(e) is smoother than <r(e). The equality (3.201 is proved in the Appendix. 
As a result, an optimal rate is obtained by chosing convenient values for fi\ and f$%. 

i 

Corollary 3. Under the same hypotheses, if h n ~ (lnn/n) 27+1 , there exists (3 €j0, 1[ such that 



i k) -e\\ 2 1 „ /In 



sup ee£ \tfn\e) - 6(e) 

and 



i n 



= 0«...( — J +Oa. S .(/3 fc ) 



min{ 7 , 7 '} 

sup|«r^( e )-a 2 ( e )| = O a . s . ( — ) 27+1 +O a . s .(/3 fc ) 

It is clear that, provided j3 is not too close to 1, the convergence of the term /3 fe to zero is fast. In other 

words, stabilisation of the iterations is easily obtained while convergence of (^ 1 ) 2 ~ l+1 to zero requires large 
sample size. More precisely, taking k = k(n) > Chin gives 

Corollary 4. Under the same hypotheses as in Corollary^and with h n ~ (lnn/n) 2 ^ +1 , if the recursive scheme 
stops after k(n) > Clnn iterations 



sup ee£ |e (n)) ( e )-6(e)| 



O a . 



suv\ol,k( n ){e)-o 2 {e)\ = O a . s . f — \ 



lnn\ 2 "'+ 1 
n 



lnra\ 2 -> +1 



Remark 1. With the above remark in mind, it is interesting to note that, when the autoregression is close to 
the instability domain, the value of (3 can approach 1. In such situations, a large number of iterations is needed 
before the stabilisation of the iterative scheme. For example consider the particular model 

X n = aX n -i + b(e n ) + e n 

where the sequence (e„) is i.i.d. From Lemma |8j 

nx n f _ 1 



where cr(0) = c 2 /(l - a 2 ) and E(X n ) = c'/(l - a). Hence, 

A = y 1 if a ->• 1. 

Consequently, the iterative scheme can be very slow if a is close to 1. On the opposite, when a is close to — 1, 
the iterations stabilize very quickly. 
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3.5 Improvement of the rate for smooth functions b 

As well-known in functional estimation, a smoother b induces, with some extra conditions on the kernel K, a 
better rate of convergence of the estimators. 

Corollary 5. If the function b is Ci for some integer I > 1 and if the kernel satisfies 

J e k K(e)de = \fkel,...,£ (3.21) 

J K(e)de = 1 (3.22) 

(i) if the smoothing parameter is such that, as n — > oo ; h n ~ n^ 1 (lnn)^ 2 there exists (3 (E]0, 1[ such that 

\\W-n* = o a . s .(J^j+o(hi)+o a . s xf3 k ) 

sup|&W(e)-&(e)| = O a . a . t\f^P\ +0{hi) + O a . s .{P k ) 

e££ V V nhn J 

l 

(ii) if h n ~ (In n/n) 2e+1 the rate of the two first terms is optimal and becomes 

O a . s . 



lnr, 
n 



The proof, based on the fact that, using (3.21 ), J (b(vh n + e) — b(e))K(v)f(vh n + e)dv = 0(h n ), is omitted. 



4 Forecasting intervals 

The natural predictor for A n+1 , 

K(X n+ i\e n+ i,e n , . . . ,ei,X n , . . . ,Xi) = t (j> n +iO + &(e n+ i) 

can be evaluated via the estimates of 9 and b based on the observations up to time n. In other words, we 
propose the predictor 

X n +1 — 4>n+l6n + bn(e n+ i). 

It should be clear that, under the conditions of Corollary [3j 

X n +i — X n+ \ c 
— ~i \ > £l > 

0~n{en+l) 

and, consequently, building a prediction interval requires an estimation of the noise's quantile function Q(t). 
The inverse of Q can be consistently estimated by 



n-l 



j = l | o n ('j + l) I 

based on the set of retroactive predictions Xj + i^ n — (f>j^i6 n + b n { e j+i)i j < n — 1. which use the estimates 
available at time n. 

Summarizing, for X n+ \ we obtain the prediction interval at asymptotic level a 

X n+ i - a n (e n+ i)Q n (a) , X n+1 + a„ (e n+1 )Q n (a) 
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5 Simulation examples: results and comments 



Several aspects of the present paper, but not all, rely on the EDF modeling-forecasting process. Some are 
theoretical and some others practical. Let us mention some of them which are of interest when performing the 
data simulation part. 

• Question 1. What is the influence of a single irregular point of the function b on the regular points estimators? 

• Question 2. The results are proved when both k — > oo and n — > oo. Could we use a simplified procedure base 
on one iteration k = 1 of the inner loop? Do we actually get benefit from the iteration process? Remember we 
do not use any preliminary estimator. From a practical point of view, can we get any information linking the 
stochastic process dependencies to a good value of kl 

• Question 3. When the EDF engineers estimate a model, the question of the sample size n is a recurrent one. 
A sample could be said to be large when either its cost is high or when the distance of the statistic distribution 
(computed with n observations) to its limit distribution is small. Theorem [2] and Corollaries [4] and [5] offer a 
speed of convergence where the constants, as often, are missing. 

The simulations answer some of these questions. 

Three type of autoregressions are chosen, two of order one and one of order 4. 

1. An AR1 process with a positive coefficient 

X n+1 = 0.7X n + b(e n ) + a(e n )e n+1 (5.23) 

2. An AR1 process with a negative coefficient 

X n+ i = — 0.7X n + b(e n ) + d(e (5.24) 

3. An AR4 process 

X n +\ = aiX n + a 2 X n -i + a 3 X n - 2 + a4A n _ 3 + b{e n ) + a(e n )e n+ i (5.25) 
where the roots of the characteristic polynomial are ±0.5 and 0.5 ± 0.25. 
The functions b and a are the same in all examples 

6(e) = ^e|, CT ( e ) = l + |*. 

The input white noise e n has a standard gaussian distribution, and the exogeneous e n = s n + rj n where s n is a 
6-periodic sequence with sq = —1.2, s± = 3.1, s 2 = 1.80, S3 = —2.51, ; S4 = —3.2, S5 = —0.25 and where the 
noise rj n is i.i.d. with marginal distribution uniform on [—3, +3]. 



5.1 Three examples 



For each of the three models (5.23), (5.24) and (5.25), a trajectory of size n = 5000 is simulated and the 



estimations of the parameter 9 and of the functional parameter b are carried over through a number k of 
iterations varying from 1 to 50. Having reached the last iteration, the estimation of a 2 is then calculated. The 
kernel K is the gaussian kernel and the smoothing parameters h n and h' n , used in the estimations of b and a, 
are 



hn = 1.5S e n- 1/2 and h' = 0.155 e 



,-1/3 



(5.26) 



where S e is the empirical standard deviation of the ej 's. 

The results are depicted in Figures [l] [i] and [ij The upper-left graphic shows the function 6(e) = \/[e[, 
its estim ate a fter 50 iterations together with the cloud of partial residuals used to calculate the estimate (see 



formula (2.5)). The upper-right graphic shows b and the evolution of its estimations b n as k varies from 1 



to 50. The lower-left graphic shows the evolution of the estimator of the AR parameter 9 as a function of the 
number k of iterations, and the lower-right one presents the standard deviation cr(e) and its final estimation. 
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5.1.1 Model ( |5.23[ ) 

Four main effects arc noticeable. 



• As the number of iterations increases, the estimates of b and of 9 improve. 

• The iterations stabilize very slowly. This is not surprising since the value of the parameter 9 = 0.7 is close 
to 1 (see Remark [I] just after Corollary [3]) . 

• As expected, for fixed k, the convergence of bn (e) is far worse in the neighbourhood of e = 0, discontinuity 
point of b' . This effect is even still visible for the estimator of a(e) (lower- right graphic), despite the 
smoothness of this function at this point. 



5.1.2 Model ( |5.24[ ) 

Compared with the first example, there are only two differences 

• The iterations stabilize quickly (4 iterations are enough), due to the fact that 9 = —0.7 is close to —1, 

• But the obtained limit value of 9^ is not very close to the true value, meaning that in this case, more 
observations are needed for a good estimation. However, the estimate of <r(e) seems quite good. 



5.1.3 Model ( |5725[ ) 

In this example 9 has 4 components. They are indicated, in the lower-left graphic, by 4 horizontal lines. The 
stabilisation point of the iterations is between those obtained in the two other examples (40 iterations are 
enough), perhaps due to the presence of the positive root 0.5. The sample size is large enough to get good 
estimations. It seems that the order of the autoregression, at least for moderate orders, has no significant effect 
on the quality of the method. 



5.2 Evolution of the estimation errors as functions of the sample size 



In the two last sections, we take model (5.24 1 and we simulate sample paths for sizes going from 200 to 10000. 
For each sample path, the estimations of the three parameters 9, b and a 2 and of the distribution of the noise 
are computed, based on k = 20 iterations. Then the estimation errors are calculated. Except for the error on 
9, we compute three sorts of errors, based on L\, L 2 and norms: 

• For the functional parameters b and a, denoting by d the length of the domain of e, we choose 



N x (h) = ~J \h(e)\de, N 2 (h) = ^ J h 2 {e)de and N^h) = \/d||/i|U 
which satisfy Ni < N 2 < 

• For the noise distribution, we compute the total variation, the Hellinger and the Kolmogorov distances. 

Moreover, in order to reduce fluctuations, we simulate fifty independent trajectories for each sample size, and 
compute the average of the errors obtained from these trajectories. 

The averaged errors are presented in Figures [4] and [5] which show, from top to bottom and left to right, the 
error on 9, b, er, and on the noise distribution (three curves in each of the three last graphics, corresponding to 
different distances). The abscissa is the sample size n. 

Figure [4] presents clearly the fact that the convergence to zero of all the errors becomes very slow when n 
is larger than 2000, meaning that the asymptotic speed (lnn/n) 1 / 4 is reached. Errors seem to quickly decrease 
for small sizes. 
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Figure [H] is a log log set of graphics. The (nearly!) straight lines represent c(ln?i/n) 1 / 4 for five values of c. 
Except for the error on the noise distribution, which decreases faster, the theoretical bound vT 1 ^ (see Corollary 
[4] with 7 = 1/2) looks exact. 



5.3 Stopping time for the iterations 

We chose to stop the backfitting iterations when the estimations arc stabilized: namely, after the first k such 
that 



< icr 



should be of order In n, 



Let us denote by k(n) the obtained stopping point. As pointed out in Corollary [4j k(i 
hence hardly varying in the domain n < 1000. 

For each model, and each sample size n, five independent trajectories are simulated. This is illustrated in 
Figure [6j for the three models ( |5.23| , ( |5.24| , ( |5.25|) The sample size varies between 100 and 1000. There are 
three groups of 5 piecewise linear lines. Model (5.24) is represented by the lines in the lower part of the graphic. 
For this model, the stopping point is almost constantly equal to 7 and 8. Models (5.25) (darkest lines) and 
(5.23) occupy the upper part. This illustrates the asymptotic theory and the observations in Figures [I] [2] and [3j 
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Estimation of b(e) 



Iterations of bhat(e(, K= 50 




^^^^^^^^^ 

-4-2 2 4 

e 

bhat{e)and bis). pcirls=Partial Residuals 

Estimation of AR param. 








^^^^^^^^^^^^^^^^^^^^ 


- — sf^m 
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1 1 1 1 1 

-8-4 -2 2 


4 a 






Estimation of sigma(e) 






Figure 1: Estimation results for model (5.23|. The trajectory length is n — 5000 and k — 50 iterations are 
performed. The upper-right figure shows the evolution of b^ k \e) for k = 1 : 50. The lower-left figure presents 
the evolution of the estimator of 9 for k = 1 : 50. The iterations stabilize very slowly, but the size of the sample 
is enough to obtain good estimation. The lower-right figure presents the estimator of cr(e) for k = 50. 
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Estimation of b(e) 



Iterations of bhat(e}, K= 50 





bhHt{e)and b;E>, points=Partial REsidugls 

Estimation of AR param, 




Estimation of sigma(e) 




Figure 2: Estimation results for model (5.24 1. Few iterations are needed, but the size sample seems to be too 



small to obtain a good estimation of 9. Nevertheless, the estimation of the functions looks satisfactory. 
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Estimation of b(e) 



Iterations of bhat(s}, K= 50 





bhatiE) and t>(e), pDints=Paitial Residuals 

Estimation of AR parara. 




Estimation of sigma(e) 




Figure 3: Estimation results for model ( |5.25 1. The coordinates of 9 are the horizontal lines on the lower-left 
figure. The iterations converge slowly (40 iterations), and the estimations are rather good. 
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teta estimation error ; K= 20 



b estimation error ; K= 20 
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200<= n <=10000 



2000 4000 6000 SOOO 10000 

200-= n <=100Q0 



Figure 4: Estimation error as a function of the length n of the trajectory. The model is (5.24). From top 
to bottom and left to right: errors on the estimation of 6, b, a, and the distance between the estimated 
distribution of the noise and the Gaussian distribution. In abscissa, the two first values are 200 and 500. Then, 
the lag remains equal to 500. In graphics 2 and 3, the positions of the three curves are conform to inequalities 
Ni < N 2 < iVoo. For the lower-right figure, the distances are (top to bottom) are Total- Variation, Hcllingcr 
and Kolmogorov-Smirnov distances. 
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Figure 5: For model (5.24), the graphics present, in log-log coordinates, the error in estimating the parameters 
9 (top-left), b (top-right) and a (bottom-left) and the distance between the estimated distribution of the noise 
and the Gaussian distribution (bottom-right). The straight lines (almost straight, because of the term Inn) 
show the curves c(lnn/n) 1 / 4 for several values of c. 
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Selected k on max dist(errorteta),dist(error b); levels=( D.1 % 0.1 %) 




6 Appendix: Proofs 



6.1 Preliminaries about the process (X n ) and its covariances 



We consider the solution of (1.2 1 defined by the MA^ expansion 



X n = ^2 9j i b ( e n-j) + (r(e n -j)e n -j) n e Z (6.27) 

J>0 



where the geometrically vanishing sequence (gj) is denned by 

1 



1 — a\z — . . . — a n z p 

Since the sequence s n is T-periodic, and r\ n is i.i.d., the T-dimensional vector sequence Z% — 1 [X^t, ■ ■ ■ , Xrk+i)T—i J 
is a strictly stationary process, each coordinate being the sum of T linear scalar processes based on T indepen- 
dent white noises. In other words, the process (X n ) is periodically correlated (see for example [TH] for a review 
on periodically correlated time series). 

Hereafter, the stationarity of Zk is the key for proving convergence results via the law of large numbers. 

6.2 Proof of Lemma Q] 

The proof consists in separating the sequence ((f>k) into the T stationary and ergodic subsequences {(4> k ^)\l = 
0, . . . , T — 1} defined in (3.11 ) and using the law of large numbers. Details are omitted. 

To check regularity of the limit M, consider the vector sequences (ipk) and (i>^) built from 

J>0 

exactly as (4>k) and (<j) k ) are built from (X n ). Similarly, consider the sequences (ip' k ) an d built from 

l 7 ,' = J2j>o 9jH e n-j)- Denoting by F'^ the covariance matrix of (ip'^), and noticing that the sequences (ipf.) 
and (ip' k ) are orthogonal, 

r« =r'« + E(4 i)t 4 i) ) 

Hence, if M is singular, the same holds for Yli=o ^ yPo^^oj ■ This in turn implies that there exist (ci, . . . , c p ) 
such that, for every k, 

ci^. H h c p Sk-p+i = as (6.28) 

where Sk = Yfc + • • • + Yfc-T+i is the sum of the Y's over a period of the input e„. Now, it is clear that Sk is a 
stationary ARMA process having the representation 

T-l 

Sk = aiSk-i + ■ ■ ■ + a p Sk-p+i + ^ a{ek-j)£k-j, 



where, from (3.10), the variance of the noise is not zero. This contradicts (6.28). 
6.3 Proof of Theorem [2] 

Most proofs below are classical in the field of kernel functional estimation. This is why some details are omitted. 
The reader can refer to [2J, [5] or [5j for complete devlopments. 
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6.3.1 Convergence of Rn^ and Rn^ 



Lemma 6. Under the assumptions Hi ..,8; o,nd if the smoothing parameter h n is such that h n ~ n^ 1 (Inn)^ 2 
with some j3\ < we have 



- O 



Inn 

nh n 



o{K) 



Proof. Given the convergence of £„/n to a regular matrix, it is enough to prove the wanted result for 

1 A A E?= P+ i (Hej) - Kei) + <r(ej) ej ) K n (e, - e 3 ) 

- 2. h 

i=p+i 

We prove the uniform convergence 



E"=p+1 Kn ( e l ~ e 3 



.29) 



sup 



Ej=p+i »( e j) - b (e) + cr(ej)gj) K n (e - e 3 ) 



O ., 



Inn 

nh n 



.30) 



The result will then follow from the fact that, thanks to the law of large numbers applied to each subsequence 



/l fe'')fci (fc = 0, . . . ,T — 1), the arithmetic mean n 1 Ep+i $ k almost surely converges 



In order to prove (6.30) we only consider 



E?= P+ i iKej) - 6(e)) K n (e - ej) ^ E^p+i (&fo) - He)) K n (e - ej ) 



Ej= P +i K n (e 



□ 



(6.31) 



The treatment of the other part in (6.29) is simpler since E(a(ej)sjK n (e — e^)) = for every j. 

• Consider first the numerator of (6.31 1 conveniently split in two parts: a variance term and a biais trem 



1 - 

N ^ = Z ( b (e j )~b(e))K n (e-e J )-E[((b(e J )-b(e))K n (e~e 3 )} 



j=p+i 



and 



1 

" j=p+i 



For the so-called variance term N\(e), the basic tool is the exponential inequality 

E U u i 



p 



> e < 2e"^ Ve e]0,3<57d[, 



.32) 



which holds for every set (Ui, . . . , [/„) of independent zero-mean variables such that \Uj\ < d and E(?7 J 2 ) < S 2 
(j = 1, . . . , n). This inequality is easily deduced from Bernstein's one as noticed in [TH], page 17. 
Looking at the independent sequence 

Uj = ±- ((b( ej -) - 6(e))Jf n ( ej - e) - E((6(e,) - 6(e))if w ( ej - e))) , 

firstly, since b and K are bounded, it is clear that 



m < r> 
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and secondly 



E(U?) < 



1 

K 
1 

hn 



{b{u) - b(e)) 2 K 2 



f{u)du 



{b{vh n + e) - b{e)YK\v)f{vh n + e)dv < 



Applying inequality (6.32) with d = S 2 = c/h n we obtain 

P(|iVi(e)| > e) < 2e- i TJ £l 0<e<l 



and then 



P |JVi(e)| >e 



Inn 

nh„ 



< 2e ~c 



A suitable choice of £o yields summability of the r.h.s. and finally, by Borel Cantelli Lemma 

iVi(e) =O a . s . 



Inn 

nh n 



We now turn to the biais term A^(e). From (3.9), 

N 2 (e) = ±- [ (b(u)-b(e))K 



We have thus proved that 



u — e 



f{u)du 



(b(vh n + e) - b(e))K(v)f(vh n + e)dv = 0{hl). 



^i(e)+A^ 2 (e) 



nh. 



O a . 



1 n 



j=p+l 



Inn 

nh n 



0{hl). 



The same rate for sup e (|iV 1 (e) + 7V 2 (e)|) is obtained by covering the domain of e by well chosen intervals and 
using Lipschitz property of the kernel. See [5] and [5] among others for the details. 
• A similar treatment leads to 



sup 

e<E£ 



V" K \ 



EiLo X /(e-sO 



nh n 



T 







.33) 



This, together with the fact that inf eS £ /(e) > 0, leads to 

E™= P+ i 0( e j) - b ( e i)) K n (e/ - ej) 



sup 
I 



< sup 



T,j= P +iKn(ei-ej) 
E"= P+ i(^)-6(e))^„(e- e ,) 



E"= P +i K n (e - ej) 



= O n 



Inn 

nh n 



o(hi) 



and the proof of (6.30) is over. 



Let us now consider the convergence of m; 



(2). 
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Lemma 7. Under the assumptions Hi t 2,3,A> 

= 0a . s . (n?) V 7 > -1/2 

Proof. The vector sequence <frk&(ek)£k is a martingale difference sequence since E(efc) = and since c^efe and 
Ek are independent. Moreover, 

E(||0 fe< 7(e fe ) £fe |||) = a 2 E (b(e k ) 2 ) E(\\<p k f 2 ). 
where IE( 1 1 ^>fe || 1 ) and E (b(ek) 2 ) are periodic. Hence, for every (3 > 1/2 



E 



E{U k a{e k )e k \\ 2 ) 
k 2p 



< 00, 



implying, from theorem 3.3.1 of 



p+1 



Finally, the convergence of S„/n leads to the conclusion. 



□ 



6.3.2 Convergence of the coefficient A n 



~(k— 11 

We prove the convergence of A n , the matrix coefficient of 9„ in (3.15). 

Lemma 8. Under the assumptions 
(i) As n — > oo, 



An 



Ei=p+iVi#n(ej 



where M is defined in LemmaYI\ 
(ii) Moreover \\A n - A\\ = O os {y/%fc) + 0(1%). 



1 ,co*„o-) [ l^ZlM^l3l du=:A 



Proof. We consider first 



Rn(e) 



EL P +i^n(e-ej) 



E? =P +i^n(e- ej ) 



The denominator has been already treated in the proof of Lemma [6] (see (6.33 1), so we focus on the numerator 
and successively show that 



sup 



E"= P+ i %K n (e - ej ) - E(Vj#„ (e - e 3 )) 



nh n 



= O a . s 



Inn 

nh n 



(6.34) 



then, with (jrl' defined in 13.11L 



sup 

eeS 



n%K n ( e - ej )) y0/(e _ s;) 



rc/i„ 



T 



(6.35) 
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The proof of (6.35) uses J K(e)de = 1. The details are omitted. The proof of (6.34 1 follows the lines of the 
proof of (6.301, the difference coming from the fact that the (</)jK n (ej — e))u are not independent. In fact, they 
are weakly dependent in so far as, conditionally to the exogeneous sequence, they are mixing. 

Lemma 9. 

(i) For every e € £ and every h, the vector sequence (<pjK( e3 h e ))j is, conditionally to the sequence {e.j)j ='■ E, 
geometrically ot-mixing. 

(ii) This property holds uniformly with respect to E: there exists a constant C and a €]0, 1[ such that, a E (n) 
being the conditional mixing sequence, 



a E (n) < Ca r 



Vrt. 



Proof. Consider for example the first coordinate K{ ^j^)Xj-i of the vector sequence. Conditionally to E, the 
sequence K(^j^) is deterministic, and it is enough to consider the sequence Xj which has the same conditional 
mixing coefficients as K(^j^-)Xj-\. From ( 6.27| 

X n = y^gj (b(e n -j) + a(e n -j)e n -j) 

is a linear time series based on the bounded noise b{e.j) +a{ej)ej, where b(ej) and a(ej) are deterministic trend 
and variance, while Ej is i.i.d. Let hj(u) be the conditional density of the noise. We obtain, since g is G\ and 
inf ee£ cr(e) > 0, 



\hj(u + x) — hj{u)\du 



< 



< 



1 



a( ej ) 

Nl'oo 



+ x-b{e 3 ) 
a(ej) 



<j{ej) 



f{v)dv 



inf eg£ cr 2 (e) 



Now, the sequence {Xj)j is bounded and, for every j, \gj\ < Cft for a certain (3 G]0, 1[. Hence the theorem in 
[14] applies, with any < 5 < 1: the sequence K ^ 1 , e )Xj_i is conditionally a-mixing, with mixing coefficients 
satisfying 

a E (n) < C (p*F*) =■ CaJ Vn 
where the constant C does not depend on E. □ 

The reader is referred to [4 a for definitions and properties of mixing sequences. Hereafter we need to replace 
inequality (6.32) by the following one, a direct consequence of theorem 6.2 in [2"3"] : 

Lemma 10. Let (Vj) be a strong mixing sequence of centered random variables such that 

a(n) < ca n , Vn and \Vj\ < M, Vj 
Denote = J2i<j,k<n \Cov{Vj, Vu)\. For any r > 1 and A > 0, 



P 



> 4A < 4 1 



-r/2 



4Mcn 



-a Mr 



(6.36) 
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This inequality applies, conditionally to E, to 

Vj =* cp 3 K n (e - ej) - E( l ^K n [e - ej)), j > P + 1. 



For this sequence Vj, the conditional variance s 2 satisfies 



= 0(nh n ) 



(6.37) 



where the O is uniform with respect to E. Indeed, 



Vax E (Vj) < ch n 



\Cov E (V j ,V l )\<h 2 n if \j-l\<S n 



{ |CW(^-,Vi)| <Ca% " if |j-l|>*„ 

For the last bound, the reader can refer to [I]. The two first ones are directly obtained. Taking S n = l/(h n Inn) 
easily leads to (16.37 1 . 



Now, with M :— \\KWooesssupj\Xj\, (6.36) leads to 



> 4A < 4 1 



cA 2 ^ - r/2 



rnh n 



AMCn -A 
; a 



1 1 



and then, if Inn = o(r n ) 



P L 



< 4 1 



> A 



Inn 

n/i„ 



cAglnnA r " /2 , \QMCn A °S^ 



16r„ 



=a 1 

AoVnft,„ Inn 



cAgln„ Cl / n A "^"'"" 

< 4e + " ■ 

A V fin mn 

Now, if /i„ ~ n^ 1 Inn' 32 with (3\ > — 1, r n — (lnn)^ we get, for n large enough, 



P L 



E; =p+ i ^jK n (e - ej) - E(Vi*T« (e - e,-)) 



> A 



Inn 

n/i„ 



1-/3! l±fa !+g2 

2 n 2 *fl" 2 (lure) 2 

< 4n- cA «+C 2 —a, 



(Inn) 



i+ga "l 



< 4n cA ° + C 2 n^~~ 



Now, the constants in (6.38) do not depend on E, implying that 

t, 



P 



n/i„ 



> A 



Inn 

nh r 



i A ln °1 



< 4n cX o + C 2 n^" + "«f * . 
and it is easy to select Aq for the r.h.s. to be the general term of a convergent series. 



(6.38) 
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So we have proved that, for fixed e, 

£?= P +i ^jK n (e - ej) - m^Kn (e - e,-)) 

The same speed is obtained for the sup-norm. 
From flol] ), ( [635] ) and ( jOSf it follows that, with 

R(e) 



O a . 



Inn 



sup 



R n (e)-R(e) 



T,J=o /(e - «i) 

= Oa... 



Inn 



+ 0....(^) 



implying in turn 



1 " 

An = nE" 1 - V" (j)iR n (ei) 

m ^ — ^ 



= 71S- 1 - V ; (i?«(e/)-i?(e / )) + nE- 1 - V faR( 
n e - — ' n z — ' 

i=p+i i=p+i 

= O a . a . (J^f) +O a . s .(hZ)+nZ- 1 ± WW 
In (6.40), the last sum is separated into T sums 

1 n T-l 1 n 



i=p+i 



which almost surely converges to 

T-l 

T 



1=0 kT+l<n 



T-l 



i=0 
T-l 



7 1 



ij=0 



+ ?7o)) 



f{v-Sj)f(v-si) 



(6.39) 



(6.40) 



Moreover, this conver gence rate, being the rate i n the law of large numbers for i.i.d sequences, is faster than 
the first two terms in (6.40). This, together with (6.40) and the almost sure convergence of nE" 1 , leads to the 
desired result. Lemma [8j is proved. 

□ 

Lemma[8| together with Lemma 11 below, shows that the passage (3.15 1 from step k — 1 to step k is a fixed 
point iteration, at least for n large enough. 



Lemma 11. There exists k$>l such that 



Moreover, kg = 1 when p = 1 . 



sup 



\A k °v\\ 2 
IMIa 



< 1. 



.41) 
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Proof. For the sake of simplicity, we take T = 2. The general case only brings more complicated formulas. 
Denoting 

S = r (l) +r (2\ 



and 



f( v - s j)f( v - s i) 

£]=0 /(«-**) 



dv, 



A = [S + ^qMo + MiMi] 1 ("ooMoMo + + aoi(/"oMi + Mi Mo)) • 

We then apply a popular matrix inversion formula: 

[5 + m Mo + M*Mi] 1 Mo : 



i+Vi^+moH'Vi i+Vi^rVi 

Mo So Vo 



where 

This leads to 
and finally to 



1 +Vo[S + /4mo]~Vi I+Vo^qVo 
Si = S + ^l^o et = S + MiMi- 



.4 



So Vo 



I + VoSq mo 



i — (aooVo + aoiVi) + 



srVi 



i + ViSj Vi 



j — (anVi + «oiVo) 



A = a o 



Sq MoMo 



Si ViMi 



i i ct ll - _.. 
Sq VoMi S 1 ViMo 



+ aoi 



where the last line defines the Sji's. 
It is easily checked that 



Si = 



1 + t fM)S Vo l + Vi^xVi 
aoo5*oo + ckuS'ii + aoi(S'oi + Sio) 



-i Sjj — fijjSjji j — 1) 2 



1 + Vj 



o2 



VfcS,. Vj 



1 Sjfc — fijkSjk j 7^ fc 



Clearly, < /3jj < 1. Moreover, 



< 



i'j s , Vj 



i + t HjS j 



f-*j i 



because the first factor is less than 1/2, and 



i +* MfeS 1 V/c 



m/c 



VfcM Vfc 
1 +* AifcM-Vfc 



< 1. 



(6.42) 



(6.43) 
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As a.ji £ [0, 1] for every j, I, it results that 



A 2 = a$Soo + afiSu + afr{(S i + £>io) 



(2) 

where for every j, /, |aj /| < Pjictjj, whence 

A k = a$M 00 + a$Mu + a$(M 01 + M w ) 

where for every j, Z, |a^| < (/Jj-i)* -1 ^/. Lemma is proved. 

It remains to prove ( 3.20[ ), the rate of convergence of the error on the standard deviation. The estimation 
error a ni k(e) is 

cr n ,fc(e) = cr„.k{e) - cr 2 (e) 

Er^+i - ^ _1) (e/)) 2 - <r 2 (e)) «■« (e - e,) 



□ 



YA=Li K n{e- ei) 



Er^+i (a 2 ( ei K-a 2 ( e ))if n ( e - e 



+ Rn,k( e ) 



(6.44) 



W=p+1 



where, from the first part of the theorem, 

R n .k(e) = O a „ 



Inn 

nh-n. 



+ o a . s .(hi) + o a . s Xp k ). 



Now, since the variables a 2 (ei)ef — cr 2 (e) are independent and centered, the first term in (6.441 can be treated 
exactly as was (16. 311), leading to 



ESh {o 2 {.ei)el-° 2 {e))K n (e-e{) 



o a . 



Inn 

nh n 



+ O a .s.(hl) 



and the proof of (3.201 is completed. 
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